ABSTRACT: This report
compared the performance of 
different classification algorithms 
such as decision tree, K-Nearest 
Neighbour (KNN), logistic 
regression, Support Vector 
Machine (SVM) and random 
forest. The dataset comprised
smartphone accelerometer and
gyroscope readings of the 
participants while performing 
different activities, such as 
walking, walking downstairs, 
walking upstairs, standing, sitting, 
and laying. Different machine 
learning algorithms were applied 
on this dataset for classification 
and their accuracy rates were 
compared. KNN and SVM were 
found to be the most accurate of 
all. 


INDEX TERMS: decision tree, 
Human Activity Recognition 
(HAR), K-Nearest Neighbour 
(KNN), logistic regression, 
random forest, Support Vector 
Machine (SVM)


I. INTRODUCTION 


Today’s smartphones are
well-equipped with several sensors
including motion detectors, such
as accelerometers and gyroscopes.
Data generated by these motion 
sensors is used as input in 
different classification models. 
These models identify the type of 
activity in which the smartphone 
user is involved, for instance,
walking, standing, walking
upstairs, walking downstairs, and
sitting.


In the recent past, the study of 
Human Activity Recognition 
(HAR) has gained popularity 
among researchers. With the right 
and precise information about 
users' activities, behaviors and 
interests, the scope of various
applications can be further
widened in different fields
including medicine, security,
entertainment, health, homecare
systems, prisoner monitoring, 
physical therapy, and 
rehabilitation, among others. 
HAR has been an active field of 
interest and research for almost a 
decade. It is mainly about 
analyzing user activities and 
interpreting the ongoing events 
accurately. Considerable effort
has been made to improve
the user experience with mobile
devices by improving their
recognition accuracy.
Classification models
play a vital role in this.

The main aim of HAR systems is
to examine human activities and
to interpret ongoing events
successfully. In this project,
different classification algorithms
were compared to find out which
model worked best for HAR. The
following five classification
models were selected for
comparison:

i. Decision Trees

ii. K-Nearest Neighbour (KNN)

iii. Logistic Regression

iv. Support Vector Machine
(SVM)

v. Random Forest


II. RELATED WORK


Charlene and Nestor explained
the basics of HAR on smartphones.
They examined the recognition
performance of the gyroscope,
accelerometer, and magnetometer.
Sensor fusion was inspected to gain
valuable insights at the feature
level. These insights assisted in
data collection and dynamic
sensing. Six activities, namely
running, walking, standing,
sitting, walking downstairs, and
walking upstairs, were recognized
from low-level sensor data. Data
was gathered based on subjects and
feature selection was carried out
to optimize resource use. In this
study, the KNN algorithm and a
decision tree were used for
classification. The findings
suggested that KNN performs better
than the decision tree [1].

Lara and Miguel collected data
about HAR using wearable sensors.
They surveyed and qualitatively
compared twenty-eight systems
with respect to the learning
approach, response time,
flexibility, obtrusiveness, and
other design issues. The basics of
feature extraction and machine
learning were also included
because they are crucial for every
HAR system. Lastly, several ideas
for future research were
presented [2].

The study in [3] compared the
advantages and disadvantages of
five algorithms (CNN, LSTM,
BLSTM, MLP, and SVM) in the
recognition of human behavior.
Modern smartphones and 
wearables often contain multiple 
embedded sensors which generate 
significant amounts of data. This 
information can be used for body 
monitoring-based areas such as 
healthcare, indoor location, user- 
adaptive recommendations and 
transportation [4]. 

The work in [5] proposed a
CNN-LSTM approach to human
activity recognition that seeks to
improve recognition accuracy by
leveraging the robust feature
extraction of a CNN while taking
advantage of the strengths of an
LSTM model in time series
forecasting and classification.

With smartphones becoming
increasingly ubiquitous and
equipped with various sensors,
there is a trend towards
implementing HAR (Human
Activity Recognition) algorithms
and applications on smartphones,
including health monitoring,
self-managing systems, and fitness
tracking [6].

In [7], convolutional layers are
combined with long short-term
memory (LSTM) in a deep neural
network for human activity
recognition (HAR). The proposed
model extracts the features in an
automated way and categorizes
them with some model attributes.
In general, LSTM is a variant of
the recurrent neural network
(RNN), which is well known for
processing temporal sequences.

Data annotation is a
time-consuming process that poses
major limitations to the
development of Human Activity
Recognition (HAR) systems.
Supervised Machine Learning (ML)
approaches require a large amount
of labeled data, especially in the
case of online and personalized
approaches that require
user-specific datasets to be
labeled [8].

Powerful algorithms are required to
analyze heterogeneous and
high-dimensional streaming data
efficiently. The work in [9]
proposes a
novel fast and robust deep 
convolutional neural network 
structure (FR-DCNN) for human 
activity recognition (HAR) using a 
smartphone. It enhances the 
effectiveness and extends the 
information of the collected raw 
data from the inertial measurement 
unit (IMU) sensors by integrating a 
series of signal processing 
algorithms and a signal selection 
module. It enables a fast 
computational method for building 
the DCNN classifier by adding a 
data compression module [9]. 

Human Activity Recognition 
(HAR) can be defined as the 
automatic prediction of the regular 
human activities performed in our 
day-to-day life, such as walking,
running, cooking, performing
office work, etc. [10].


III. DATASET


The dataset used for this project
was obtained from the UCI ML
Repository. This HAR dataset
was built from the recordings of
30 participants carrying out
activities of daily living (ADL)
while carrying a smartphone with
embedded inertial sensors.


Each person performed six
activities (walking, walking
downstairs, walking upstairs,
laying, standing, and sitting)
while carrying a smartphone. The
embedded accelerometer and
gyroscope captured 3-axial linear
acceleration and 3-axial angular
velocity at a constant rate of
50 Hz.

This data set carried the following
readings for each record:

o The triaxial acceleration from
the accelerometer (total
acceleration) and the estimated
body acceleration, together with
the triaxial angular velocity
taken from the gyroscope.

o A feature vector of 561
time-domain and frequency-domain
variables.


IV. EXPLORATORY ANALYSIS


Data Preparation

The data set was split into two
different files for training and
testing data. Exploratory analysis
was performed on the data set to
understand the features and
activities.

Training Data

    # features: all columns except the label and the subject id
    train_X = train_data.drop(['Activity', 'subject'], axis=1)
    train_Y = train_data['Activity']

Test Data

    # same column selection for the held-out test file
    test_X = test_data.drop(['Activity', 'subject'], axis=1)
    test_Y = test_data['Activity']

Fig. 2. Features Distribution

[Figure labels: WALKING, WALKING DOWNSTAIRS, WALKING UPSTAIRS]

Fig. 3. Activities Performed by Percentage

From the distribution of
activities above, it is clear that
all classes are of approximately
the same size. This is important
in order to ensure that the
machine learning algorithms are
not biased towards any one of the
given classes.
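
The class-balance check described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming the activity labels are available as a plain list; the function name `class_percentages` and the label values below are made-up placeholders, not the actual counts from the dataset.

```python
from collections import Counter


def class_percentages(labels):
    """Return the share of each class label as a percentage of all samples."""
    counts = Counter(labels)
    total = len(labels)
    return {label: 100.0 * n / total for label, n in counts.items()}


# Hypothetical labels for illustration only (not the real dataset counts).
labels = ["WALKING"] * 3 + ["SITTING"] * 3 + ["LAYING"] * 4
percentages = class_percentages(labels)
for label, pct in sorted(percentages.items()):
    print(f"{label}: {pct:.1f}%")
```

If one class had a much larger share than the others, a classifier could reach a high accuracy simply by favouring that class, which is why roughly equal shares matter here.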


V. METHODOLOGY 


Machine learning methods
were applied to the HAR dataset
using different classification
models, as explained below.

A. DECISION TREE 

A decision tree is a
representation model used for the
classification of samples. It is
a supervised machine learning
approach in which the data is
repeatedly split according to a
certain parameter.
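
To make the idea of splitting concrete, the sketch below scores a candidate split on a single feature. The paper does not state which split criterion was used; Gini impurity is assumed here as one common choice, and the function names and the toy feature values are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def split_impurity(values, labels, threshold):
    """Weighted Gini impurity after splitting on `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    total = len(labels)
    return (len(left) * gini(left) + len(right) * gini(right)) / total


# Hypothetical one-feature example: low values sit, high values walk.
values = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0]
labels = ["SITTING", "SITTING", "SITTING", "WALKING", "WALKING", "WALKING"]
print(gini(labels))                          # impurity before splitting
print(split_impurity(values, labels, 0.5))   # impurity after the split
```

A tree-building procedure would try many thresholds and features and keep the split with the lowest weighted impurity, then repeat on each resulting subset.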

The decision tree algorithm was
applied to the training data with a
maximum depth of 8. After fitting
the model on the training set, the
accuracy on the test data (test.csv
file) was found to be 88%, which
is reasonably accurate.

To improve the accuracy,
cross-validation was also
applied and the result was 87%,
which is slightly lower than the
original. Then, the data was split
into three different portions:
training (60%), validation (20%)
and testing (20%). A new model
was trained on the split
data. The scores of the model on
validation data and test data were
checked and the accuracy was
found to be 94%, better than the
initial model.
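
The 60/20/20 partition described above can be sketched as follows. The paper does not state how the split was drawn, so a seeded random shuffle is assumed here; the function name `train_val_test_split` and the `seed` parameter are hypothetical.

```python
import random


def train_val_test_split(samples, train=0.6, val=0.2, seed=0):
    """Shuffle samples and split them into train/validation/test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


# Hypothetical sample indices standing in for rows of the dataset.
train_part, val_part, test_part = train_val_test_split(range(100))
print(len(train_part), len(val_part), len(test_part))
```

The validation part is used to compare model settings, while the test part is kept aside for the final accuracy figure.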
B. K-NEAREST NEIGHBOUR 
(KNN) 
KNN classifies objects based on
the closest samples in the
training data. It is considered to
be the simplest of all machine
learning classification algorithms.
KNN was applied to the training
data set with n_neighbors set to
10. After training, the accuracy on
the test set was 90.6%.
The same split into training,
validation and test data was then
applied for KNN. After training
the model, the accuracy improved
to 96.6%.
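
The nearest-neighbour rule itself is short enough to write out directly. The sketch below is a plain-Python illustration, assuming Euclidean distance and a simple majority vote; it uses k = 3 only because the toy data is tiny, whereas the study used 10 neighbours. The points and labels are made up.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-feature training samples.
train_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
                (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_labels = ["SITTING"] * 3 + ["WALKING"] * 3
prediction = knn_predict(train_points, train_labels, (0.95, 1.0), k=3)
print(prediction)
```

Because every prediction compares the query against all training samples, the method needs no training step but becomes slower as the training set grows.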
C. LOGISTIC REGRESSION 
Logistic regression is a statistical
learning method used in
supervised machine learning. It is
mostly used for classification
tasks.
It was applied to the training set.
After training the model, the
score on the test data was 99%.
However, logistic regression, by
default, is limited to two-class
classification. Therefore, its
results have not been incorporated
in the final conclusion. Although
some extensions, such as
one-vs-others, can make logistic
regression useful for solving
multi-class classification
problems, they are beyond the
scope of this study.
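
For readers unfamiliar with the one-vs-others idea, a minimal sketch follows: one binary logistic model is kept per class, and the class whose model reports the highest probability is chosen. The weights and biases below are hand-picked hypothetical values, not parameters fitted to the dataset, and the function names are illustrative.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_one_vs_rest(models, features):
    """Pick the class whose binary logistic model gives the highest probability.

    `models` maps a class name to the (weights, bias) of its
    "this class vs. all others" classifier.
    """
    scores = {}
    for label, (weights, bias) in models.items():
        z = sum(w * x for w, x in zip(weights, features)) + bias
        scores[label] = sigmoid(z)
    return max(scores, key=scores.get), scores


# Hypothetical, hand-picked weights for three classes and two features.
models = {
    "WALKING": ([2.0, 0.0], -1.0),
    "SITTING": ([-2.0, 0.0], 1.0),
    "LAYING": ([0.0, 2.0], -1.0),
}
label, scores = predict_one_vs_rest(models, [1.5, 0.2])
print(label, scores)
```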


D. SUPPORT VECTOR
MACHINE (SVM) 

Another very popular machine
learning approach, the
Support Vector Machine (SVM),
finds a hyperplane in an
N-dimensional space (where N is
the number of features) that
clearly separates the data points.
SVM finds the hyperplane with the
maximum margin, that is, the
maximum distance between the
data points of the classes.
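
For the linearly separable two-class case, the maximum-margin idea described above is commonly written as the optimization problem below. The symbols are introduced here for illustration and do not appear in the paper: the x_i are feature vectors, the y_i are class labels taking the values -1 or +1, and w and b define the hyperplane w^T x + b = 0.

```latex
\min_{w,\,b} \; \frac{1}{2}\,\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \left( w^{\top} x_i + b \right) \ge 1, \qquad i = 1, \dots, n
```

Under these constraints the margin between the two classes equals 2 / ||w||, so minimizing ||w|| is the same as maximizing the margin.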

SVM was applied to the training
data. The hyper-parameters were
set to C = 1.0 and gamma = 'scale'.
After training the SVM model,
the accuracy on the test set was
93%.

The same split into training,
validation and test data was
repeated for SVM. After training
the model, a slight improvement
was observed, with an accuracy
of 96.8%.
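
The setting gamma = 'scale' matches the convention used by scikit-learn's SVC, where gamma is computed as 1 / (n_features * variance of all feature values). The sketch below works through that rule in plain Python, assuming that convention and an RBF kernel; the paper does not name the kernel, and the helper names and feature rows are hypothetical.

```python
import math
from statistics import pvariance


def gamma_scale(rows):
    """Gamma under the 'scale' rule: 1 / (n_features * variance of all values)."""
    n_features = len(rows[0])
    all_values = [value for row in rows for value in row]
    return 1.0 / (n_features * pvariance(all_values))


def rbf_kernel(a, b, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance between a and b)."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * squared_distance)


# Hypothetical feature rows (two samples, two features each).
rows = [[0.0, 2.0], [2.0, 4.0]]
gamma = gamma_scale(rows)
print(gamma, rbf_kernel(rows[0], rows[1], gamma))
```

Tying gamma to the number of features and the spread of the data keeps the kernel values in a useful range when the feature vector is long, as with the 561 features used here.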
VI. RANDOM FOREST 


A random forest consists of a
large number of individual
decision trees that work as an
ensemble. Each decision tree in
this classifier makes a class
prediction. The class with the
majority of votes serves as the
prediction of the model.


After SVM, random forest was
applied. The accuracy score of
this model (92%) was better than
that of the initial decision tree
model (88%).
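
The voting step described above can be illustrated in isolation. The sketch assumes each tree has already produced a class prediction for one sample; the function name `majority_vote` and the list of predictions are hypothetical.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical predictions from five individual trees for one sample.
tree_predictions = ["WALKING", "STANDING", "WALKING", "SITTING", "WALKING"]
forest_prediction = majority_vote(tree_predictions)
print(forest_prediction)
```

Individual trees can make different mistakes, and combining their votes tends to cancel some of those errors out.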


VII. CONCLUSION 


In this project, several machine
learning algorithms were used to
create predictive models for the
classification of human activities
based on the dataset obtained
from smartphone sensors. Five
different machine learning
techniques were used for
classification (KNN, logistic
regression, decision tree, SVM,
and random forest) and their
accuracy rates were compared.

Overall, all models performed
very well on this dataset.
However, K-Nearest Neighbour
(KNN) and Support Vector
Machine (SVM) were found to
have the highest accuracy among
all, that is, 96.6% and 96.8%,
respectively.


For future work, the same data
could also be used to predict the
subjects (people), and machine
learning models could be used to
examine patterns in the sensor
data of different people.